Blog Scraping

Blog scraping is the process of scanning through a large number of blogs, usually with automated software, to search for and copy content. The software, and the individuals who run it, are sometimes referred to as blog scrapers. Blog scraping means copying a blog, or blog content, that is not owned by the individual initiating the scraping. If the material is copyrighted, this is considered copyright infringement, unless a license relaxes the copyright or the use falls under a fair-use or private-use law in the relevant country. Scraped content is often republished on spam blogs (splogs); such sites are known as scraper sites.


Issues

A blog scraper who gathers copyrighted material may be in violation of the law, depending on the case, how the data is used, and the country. Blog scraping can create problems for the individual or business that owns the blog, and it is a particular concern for business owners and business bloggers.

Scrapers can copy an entire post from an independent or business blog. In that case, the duplicated content will include the author's tag and a link back to the author's site (if that link appears in the author's tag). However, most blog scrapers copy only the portion of the content that is keyword-relevant to their splog's topic. This serves two purposes. First, it increases the keyword relevancy of the scraper's site. Second, by not scraping the entire post, the scraper eliminates any outbound links, so the scraper site's search engine ranking is not reduced. Additionally, scraped content can appear on any type of splog or RSS-fed spam site. An unsuspecting author could therefore find their creative or copyrighted material copied onto a site promoting pornography or similar content that may offend the original author and their audience, which may damage the original author's reputation.



External links


WordPress Feed Copyrighter Plugin
Six Steps to Prevent Content Theft and Combat Copyright Infringement on Your Business Blog
Defending your site from blog scrapers